A PLSA-based language model for conversational telephone speech

Authors

  • David Mrva
  • Philip C. Woodland
Abstract

This paper describes experiments with a PLSA-based language model for conversational telephone speech. The model uses a long-range history and exploits topic information in the test text to adjust the probabilities of test words. Compared with a traditional word+class-based n-gram, the PLSA-based model lowered test-set perplexity by 13% (an optimistic estimate, using the reference transcript as history) or by 6% (a realistic estimate, using the recognised transcript as history). The paper also introduces the use of confidence scores to weight words in the history, a weight on the prior topic distribution, and a way of calculating perplexity that accounts for recognition errors in the model context.
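
The abstract describes the model only at a high level. As a rough illustration of the standard PLSA formulation P(w|h) = Σ_z P(w|z) P(z|h), the sketch below estimates the topic mixture P(z|h) of a history by EM "fold-in" with P(w|z) held fixed, then computes word probabilities and perplexity. The confidence weighting (each history word counted with its confidence score as a fractional count) and the prior weighting (the prior topic distribution added as `prior_weight` pseudo-counts) are assumptions chosen for illustration, not necessarily the paper's formulation. All function names, the toy model `P_W_GIVEN_Z`, and the confidence values are hypothetical.

```python
import math

# Hypothetical toy model: 2 topics over a 4-word vocabulary (word ids 0..3).
# Row z holds P(w | z); topic 0 favours words 0 and 1, topic 1 favours 2 and 3.
P_W_GIVEN_Z = [
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]


def fold_in_topics(history, confidences, p_w_given_z, p_z_prior,
                   prior_weight=1.0, iters=20):
    """Estimate P(z | h) by EM fold-in with P(w | z) held fixed.

    Assumption: each history word counts with its confidence score as a
    fractional count, and the prior topic distribution is added as
    `prior_weight` pseudo-counts in every M-step.
    """
    num_topics = len(p_z_prior)
    p_z_h = list(p_z_prior)  # initialise the topic mixture from the prior
    for _ in range(iters):
        # Pseudo-counts contributed by the prior topic distribution.
        counts = [prior_weight * p for p in p_z_prior]
        for w, c in zip(history, confidences):
            # E-step: P(z | w, h) is proportional to P(w | z) P(z | h).
            joint = [p_w_given_z[z][w] * p_z_h[z] for z in range(num_topics)]
            total = sum(joint)
            if total == 0.0:
                continue  # word has zero probability under every topic
            for z in range(num_topics):
                counts[z] += c * joint[z] / total
        # M-step: renormalise the confidence-weighted counts.
        norm = sum(counts)
        if norm == 0.0:
            return list(p_z_prior)  # no evidence and no prior mass
        p_z_h = [x / norm for x in counts]
    return p_z_h


def plsa_prob(word, p_z_h, p_w_given_z):
    """PLSA word probability: P(w | h) = sum over z of P(w | z) P(z | h)."""
    return sum(p_w_given_z[z][word] * p_z_h[z] for z in range(len(p_z_h)))


def perplexity(probs):
    """Perplexity of a sequence, given the model probability of each word."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_reduction(baseline_ppl, new_ppl):
    """Fractional perplexity reduction of a new model over a baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl


if __name__ == "__main__":
    # Recognised history with per-word confidence scores (made-up values).
    p_z_h = fold_in_topics([0, 1, 0], [1.0, 0.9, 0.5], P_W_GIVEN_Z, [0.5, 0.5])
    print("P(z|h) =", p_z_h)
    print("P(w=0|h) =", plsa_prob(0, p_z_h, P_W_GIVEN_Z))
    print("P(w=2|h) =", plsa_prob(2, p_z_h, P_W_GIVEN_Z))
```

In this sketch a word with confidence 0 has no influence on P(z|h), and a larger `prior_weight` keeps the estimate closer to the prior when the history is short or unreliable. A 13% perplexity reduction, as reported above, corresponds to `relative_reduction` returning about 0.13.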

Similar resources

A New Bigram-PLSA Language Model for Speech Recognition

A novel method for combining bigram model and Probabilistic Latent Semantic Analysis (PLSA) is introduced for language modeling. The motivation behind this idea is the relaxation of the “bag of words” assumption fundamentally present in latent topic models including the PLSA model. An EM-based parameter estimation technique for the proposed model is presented in this paper. Previous attempts to...

Improving English Conversational Telephone Speech Recognition

The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation and score fusion of DNN and BLSTM models. Training set consisted of the 300 ho...

Recognizing Call-center Speech Using Models Trained from Other Domains

In this paper, we introduce a new conversational speech task – recognizing call-center speech – using data collected from Dragon’s own technical support line. We compare performance of models trained from conversational telephone speech (the Switchboard corpus) and models trained from predominantly read, microphone speech, and report on a series of experiments focusing on adapting the microphon...

Experiments for an approach to language identification with conversational telephone speech

This paper presents our recent work on language identification research using conversational speech (the LDC Conversational Telephone Speech Database). The baseline system used in this study was developed recently [4, 5]. It is based on language-dependent phone recognition and phonotactic constraints. The system was trained using monologue data and obtained an error rate of around 9% on a com...

Using Continuous Space Language Models for Conversational Speech Recognition

Language modeling for conversational speech suffers from the limited amount of available adequate training data. This paper describes a new approach that performs the estimation of the language model probabilities in a continuous space, allowing by these means smooth interpolation of unobserved n-grams. This continuous space language model is used during the last decoding pass of a state-of-the...

Publication date: 2004